Abstract


Previous studies have identified several alleles that differ in transcriptional activity and
protein-DNA binding between breast cancer patients and healthy individuals through multi-step
experimental approaches. This study presents a computational model for investigating the regulatory
potential and functional properties of disease-related non-coding single nucleotide polymorphisms
(SNPs) in the Iranian population using several online in silico tools. The association between the
risk of breast cancer and its putative SNPs in the Iranian population was investigated through the
SNPedia database and genome-wide association studies (GWAS). Furthermore, a meta-analysis was
performed with Comprehensive Meta-Analysis (CMA) software. Functional analyses were carried out
with LDlink, HaploReg, and RegulomeDB. The impact of each SNP on gene expression profiles and
transcription factor binding sites was predicted by RegulomeDB, which assigned scores of "5", "6",
and "1d" to rs3746444, rs1062577, and rs1049174, respectively. The RegulomeDB scores of
rs3746444-MYH7B/MIR499A and rs1062577-ESR1 suggested that they are not putative functional SNPs
and may not be associated with significant eQTL signals. The "1d" score for
rs1049174-RP11-277P12.20 confirmed an association with the expression of the target gene. The proxy
variants rs6088678 and rs2617160 were identified in non-coding segments using LDlink. They were in
strong linkage disequilibrium (LD) with rs3746444 and rs1049174, respectively. The non-coding
variants rs6088678-TRPC4AP and rs2617160-RP11-277P12.20, which had high-ranked scores, showed the
strongest association with expression. This work provides a rapid and direct in silico approach for
identifying functional genetic variants in breast cancer. The analyses evaluated the association of
the selected SNPs with regulatory elements, including histone marks, DNase hypersensitivity, motif
changes, and selected eQTL signals. The approach can be extended to other complex SNP-related
diseases.


Keywords: Epigenetics, Functional single nucleotide polymorphisms, Genome-Wide Association Studies, Linkage 
Disequilibrium, RegulomeDB scoring system. 


Introduction 


Single nucleotide polymorphisms (SNPs) represent the most common markers of genomic diversity
among individuals (Coetzee et al., 2012). The overwhelming majority of significantly associated
genetic variants identified through GWAS fall outside coding regions. Hence, it is difficult to
understand how a specific SNP increases disease susceptibility (Meng et al., 2018). SNPs have a
crucial role in predicting the risk of various complex diseases, including cancer (Seyedmir et
al., 2017). Cis-regulatory regions (non-coding DNA regions) comprise elements such as promoters,
enhancers, and insulators, which regulate transcriptional activity and complex spatial and
temporal gene expression following the binding of transcription factors (Bauer-Mehren et al.,
2009).

In addition, the majority of epigenetic changes may be reversible or preventable. The restoration
of epigenetic changes could therefore be applied as a strategy for cancer treatment or prevention
(Coetzee et al., 2012). Highly advanced web-based tools are available for annotating a specific
SNP to a target gene. It is also possible to estimate the causal risk among numerous non-coding
loci before performing time-consuming validation experiments. Such approaches will make it
possible to predict more accurately the likelihood of a particular cancer risk for individuals or
communities (Coetzee et al., 2012). These types of studies are based on two hypotheses: I)
alterations in regulatory regions are major determinants of changes in gene expression; II) motifs
in regulatory regions exhibit a location preference (e.g. at the center of an H3K27ac, H3K4me1, or
DNase peak) (Meng et al., 2018). It has been observed that the chromatin status of enhancers is
determined by highly specific histone modification patterns, which are strongly linked to
cell-type-specific gene expression programs on a global scale (Heintzman et al., 2009). Along with
H3K4me1, a general mark of enhancers, H3K27ac enrichment is used to identify active enhancers.
Sequences with high H3K4me1 enrichment and low H3K27ac are considered ready-to-activate enhancers
and are associated with low gene expression levels (Rhie et al., 2013). Hence, in line with these
kinds of experiments, the present study conducted a comprehensive in silico analysis based on
computational methods including RegulomeDB, HaploReg, and LDlink. The Encyclopedia of DNA Elements
(ENCODE, from ChIP-seq experiments) and Roadmap Epigenomics (from methods such as ChromHMM) were
used as data resources (Edwards et al., 2013). We selected breast cancer as the phenotype of
interest among those covered by genome-wide association studies (GWAS). A list of three breast
cancer risk-associated SNPs was obtained from the GWAS Catalog and SNPedia. Our purpose was to
determine the functional value of rs3746444, rs1062577, and rs1049174, which had been identified
through wet-lab experiments in the Iranian population.

We then performed pairwise comparisons with functional proxy variants suggested by LDlink. LDlink
(analysistools.nci.nih.gov) is a web-based application for exploring population-specific haplotype
structure and linking correlated alleles of possible functional variants (Machiela et al., 2015).
Because linkage disequilibrium (LD) structure is important in indigenous populations, LDlink was
used to find two proxy variants for comparison with the query variants in the Iranian population.
We found coding proxy variants with numerically high RegulomeDB scores, which do not have any
functional effect on regulatory regions. In contrast, a non-coding proxy variant with a
numerically low RegulomeDB score, which indicates stronger regulatory evidence, was also selected.
The RegulomeDB variant classification scheme is fully described by Boyle et al. (2012).
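As a reading aid for the scores used throughout this study, the following sketch orders RegulomeDB scores from strongest to weakest regulatory evidence. The descriptions for "1d", "1f", and "6" are those reported in Table 2; the description for "5" follows the published RegulomeDB scheme as we understand it. The dictionary and helper names are illustrative assumptions, not part of RegulomeDB.

```python
# Evidence summaries for the RegulomeDB scores that appear in this study.
# "1d", "1f" and "6" are taken from Table 2; "5" follows the published
# RegulomeDB categories (assumption: version 1 of the scoring scheme).
REGULOMEDB_EVIDENCE = {
    "1d": "eQTL + TF binding + any motif + DNase peak",
    "1f": "eQTL + TF binding / DNase peak",
    "5": "TF binding or DNase peak",
    "6": "Motif hit",
}


def rank_key(score: str) -> tuple:
    """Sort key: smaller tuples mean stronger regulatory evidence."""
    score = score.strip().lower()
    category = int(score[0])   # leading category digit
    suffix = score[1:]         # optional sub-category letter, "" if absent
    return (category, suffix)


def has_eqtl_support(score: str) -> bool:
    """Only category 1 scores include eQTL evidence."""
    return rank_key(score)[0] == 1


if __name__ == "__main__":
    scores = {"rs3746444": "5", "rs1062577": "6",
              "rs1049174": "1d", "rs6088678": "1f"}
    for rsid, score in sorted(scores.items(), key=lambda item: rank_key(item[1])):
        print(rsid, score, REGULOMEDB_EVIDENCE[score], has_eqtl_support(score))
```

A numerically low score therefore sorts first, which is the sense in which "low" scores are described as high-ranked in this paper.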

HaploReg v4.1 is another web-based tool for exploring annotations and generating mechanistic
hypotheses about the impact of non-coding variants on clinical phenotypes and normal variation
(Fayez et al., 2018).


Materials and Methods 


Selection of SNPs 

In the present study, SNPs associated with breast cancer risk in the Iranian population were
identified through SNPedia (www.snpedia.com) and the GWAS Catalog (www.ebi.ac.uk/gwas). The
detected SNPs were rs3746444 (Kabirizadeh et al., 2016; Jiang et al., 2015; Zou et al., 2012; Mu
et al., 2017; Wang, Y. et al., 2012; Wang, L. et al., 2012), rs1062577 (Dehghan et al., 2017; Chen
et al., 2016), and rs1049174 (Ghobadzadeh et al., 2013). Parameters including odds ratios (OR),
confidence intervals (CI), number of samples, authors' names, and host countries for these SNPs
were extracted from the relevant literature to conduct a comprehensive meta-analysis. The most
suitable SNPs were selected for downstream procedures.


In-silico studies
The LDlink (www.ldlink.nci.nih.gov), HaploReg
(www.pubs.broadinstitute.org/mammals/haploreg/haploreg.php), and RegulomeDB
(http://www.regulomedb.org) web tools and databases were applied to determine the functional
value of the selected polymorphisms. Figure 1 outlines our processing pipeline.

Figure 1. The pipeline consists of several key steps: SNP collection (SNPedia and GWAS Catalog),
comprehensive meta-analysis (CMA), investigation of linkage disequilibrium patterns across a
variety of ancestral populations (LDlink), and development of mechanistic hypotheses about the
impact of non-coding variants on clinical phenotypes (RegulomeDB and HaploReg). In the final step,
we sought to confirm whether these SNPs are located in regulatory segments and have a functional
impact on gene expression patterns (Putative Functional SNP).

SNPedia: a wiki-based bioinformatics website that serves as a database of single nucleotide
polymorphisms (SNPs). NHGRI-EBI GWAS Catalog: a publicly available resource of genome-wide
association studies (GWAS) and their results. LDlink: a web-based application for exploring
population-specific haplotype structure and linking correlated alleles of possible functional
variants. HaploReg: a tool for exploring annotations of the non-coding genome at variants on
haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. RegulomeDB: a
database that annotates SNPs with known and predicted regulatory elements in the intergenic
regions of the H. sapiens genome. Known and predicted regulatory DNA elements include regions of
DNase hypersensitivity, binding sites of transcription factors, and promoter regions that have
been biochemically characterized to regulate transcription. Sources of these data include public
datasets from GEO, the ENCODE project, and published literature. Query SNP: the RS number of the
query variant, which must match a bi-allelic variant. Table of proxy variants: by default, the ten
variants with the highest R2 values and closest distance to the query variant are displayed.
External links lead to the variant RS number in dbSNP, coordinates in the UCSC Genome Browser, and
regulatory information (if any) in RegulomeDB.


The LDlink web tool was used to detect proxy SNPs in strong LD (r² > 0.8) with rs3746444,
rs1062577, and rs1049174. Proxy SNPs with the following properties were selected for further
analysis: I) coding SNPs with high RegulomeDB scores (4-6), which have the least evidence for
binding to regulatory proteins and participating in the regulation of gene expression; II)
non-coding SNPs with low RegulomeDB scores (1a-1f), which have the most evidence for binding to
regulatory proteins and participating in the regulation of gene expression. In addition, the LDhap
option of LDlink was applied to evaluate haplotype frequencies between input SNPs and proxy SNPs.
Histone modifications in human tissues relevant to breast cancer, such as breast myoepithelial
primary cells (MEPs) and breast variant human mammary epithelial cells (vHMECs) (Fayez et al.,
2018), were investigated with the HaploReg v4.1 tool.

HaploReg was used to explore the annotations of the non-coding genome at variants on haplotype
blocks, such as candidate regulatory SNPs at disease-related loci (Hamdi et al., 2018). The impact
of genetic changes on different tissues and biological systems was examined using HaploReg,
RegulomeDB, and the Genotype-Tissue Expression (GTEx) portal (Ward et al., 2011).


Results


Identification of single nucleotide polymorphisms

The meta-analysis results obtained with CMA software are shown in Table 1. The effect of rs3746444
on breast cancer was predicted to be "low". The interactive plot for rs3746444 shows a high
density of SNPs in high LD around this variant (Figure 2).
Using the LDlink web tool, it was confirmed that two proxy SNPs (rs3746435 and rs6088678) in
strong LD (r² = 0.87) with the query SNP rs3746444 are associated with breast cancer pathogenesis
(Table 2). A RegulomeDB score of "5" was obtained for rs3746435, which is located in a coding
region. In contrast, the non-coding variant rs6088678 received a low score of "1f", implying a
higher level of functional evidence (Table 2).

HaploReg indicated that rs3746444 induces the histone modification H3K4me1_Enh in breast
myoepithelial primary cells and is located in a DNase I hypersensitive region in breast variant
human mammary epithelial cells. This suggests that motif changes may make the chromatin
accessible, allowing DNase I to cut DNA in this region (Table 3). In contrast, the proxy SNP
rs3746435 (missense), with a score of "5", did not result in any histone modification in the
examined breast cancer cell lines (Table 3). HaploReg also showed that the proxy SNP rs6088678
caused the histone modification H3K9ac_Pro. It is located within the promoter region and
transcription start site (TSS) and was effective in the regulation of TRPC4AP expression (Table
3). Query SNPs, i.e. SNPs previously identified as associated with breast cancer in the Iranian
population, are marked in bold.


[Figure 2 plot: "Proxies for rs3746444 in EAS+SAS"; x-axis ticks 33.4, 33.6, 33.8, 34]

Figure 2. The interactive plot obtained for rs3746444 (blue circle, RegulomeDB score = 5) with the
LDlink web tool. The complete, high-resolution chart can be viewed through the following link:
(https://ldlink.nci.nih.gov/?var=rs3746444&pop=CHB%2BJPT%2BCHS%2BCDX%2BKHV%2BGIH%2BPJL%2BBEB%2BSTU%2BITU&r2_d=r2&window=500000&tab=ldproxy).

Interactive plot: interactive plot of the query variant (rs3746444) and all bi-allelic dbSNP
variants within plus or minus 500 kilobases (kb) of the query variant. The X axis shows the
chromosomal coordinates and the Y axis shows the pairwise R2 value with the query variant as well
as the combined recombination rate from HapMap. Each point represents a proxy variant and is
colored based on function, sized based on minor allele frequency, and labeled based on regulatory
potential (the regulatory potential number of rs3746444 is 5). Hovering over a point displays
detailed information on the query and proxy variants. Reference population(s) (South Asian (SAS)
and East Asian (EAS)): selected from the drop-down menu. At least one 1000 Genomes Project
sub-population is required, but more than one may be selected. R2/D' toggle: selects whether the
output is based on estimated R2 or D'.
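The link in the Figure 2 caption encodes the LDproxy query as URL parameters. The following sketch rebuilds a link of that form with Python's standard library, so that the same query can be repeated for another variant or population set. The function name `ldproxy_url` is a hypothetical helper; the base address and parameter names are simply those visible in the caption, and the site's address may have changed since publication.

```python
from urllib.parse import urlencode

# 1000 Genomes sub-populations used for Figure 2 (EAS + SAS), as listed in the caption link.
EAS_SAS = ["CHB", "JPT", "CHS", "CDX", "KHV", "GIH", "PJL", "BEB", "STU", "ITU"]


def ldproxy_url(rsid, populations, metric="r2", window=500000,
                base="https://ldlink.nci.nih.gov/"):
    """Build an LDproxy link in the same form as the Figure 2 caption link."""
    params = {
        "var": rsid,
        "pop": "+".join(populations),  # urlencode percent-encodes '+' as %2B
        "r2_d": metric,                # "r2" as in the caption; other values are untested assumptions
        "window": window,              # base pairs on each side of the query variant
        "tab": "ldproxy",
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(ldproxy_url("rs3746444", EAS_SAS))
```

Keeping the population list as data makes it straightforward to restrict the reference panel, for example to the South Asian sub-populations only.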


Table 1. Results from meta-analysis of association studies for rs3746444.

Reference study | Odds ratio | Lower limit | Upper limit | Z value | p value
Wang, Y., Yang, B. and Ren, X. (2012) Hsa-miR-499 polymorphism (rs3746444) and cancer risk: a meta-analysis of 17 case-control studies. Gene 509(2): 267-272. | 1.230 | 1.059 | 1.429 | 2.710 | 0.007
Mu, K., Wu, Z. Z., Yu, J. P., Guo, W., Wu, N., Wei, L. J. and Liu, J. T. (2017) Meta-analysis of the association between three microRNA polymorphisms and breast cancer susceptibility. Oncotarget 8(40): 68809. | 1.170 | 1.025 | 1.336 | 2.319 | 0.020
Wang, L., Qian, S., Zhi, H., Zhang, Y., Wang, B. and Lu, Z. (2012) The association between hsa-miR-499 T>C polymorphism and cancer risk: a meta-analysis. Gene 508(1): 9-14. | 1.160 | 0.995 | 1.353 | 1.892 | 0.058
Zou, P., Zhao, L., Xu, H., Chen, P., Gu, A., Liu, N. and Lu, A. (2012) Hsa-mir-499 rs3746444 polymorphism and cancer risk: a meta-analysis. Journal of Biomedical Research 26(4): 253-259. | 1.100 | 1.004 | 1.205 | 2.049 | 0.040
Jiang, S. G., Chen, L., Tang, J. H., Zhao, J. H. and Zhong, S. L. (2015) Lack of association between Hsa-Mir-499 rs3746444 polymorphism and cancer risk: meta-analysis findings. Asian Pacific Journal of Cancer Prevention 16(1): 339-344. | 1.180 | 1.035 | 1.346 | 2.466 | 0.014
Kabirizadeh, S., Azadeh, M., Mirhosseini, M., Ghaedi, K. and Tanha, H. M. (2016) The SNP rs3746444 within mir-499a is associated with breast cancer risk in Iranian population. Journal of Cellular Immunotherapy 2(2): 95-97. | 1.922 | 1.064 | 3.471 | 2.167 | 0.030
Pooled estimate | 1.157 | 1.094 | 1.223 | 5.149 | 0.000
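The pooled row of Table 1 can be checked with a few lines of arithmetic. The sketch below assumes a fixed-effect, inverse-variance model on the log odds-ratio scale, with each study's standard error recovered from its 95% confidence limits. The settings actually used in CMA are not stated in the text, so this is an illustrative reconstruction rather than the authors' procedure, and the function and variable names are hypothetical.

```python
import math

# (odds ratio, lower 95% limit, upper 95% limit) for the six studies in Table 1
STUDIES = [
    (1.230, 1.059, 1.429),  # Wang, Y. et al., 2012
    (1.170, 1.025, 1.336),  # Mu et al., 2017
    (1.160, 0.995, 1.353),  # Wang, L. et al., 2012
    (1.100, 1.004, 1.205),  # Zou et al., 2012
    (1.180, 1.035, 1.346),  # Jiang et al., 2015
    (1.922, 1.064, 3.471),  # Kabirizadeh et al., 2016
]

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def pooled_odds_ratio(studies):
    """Fixed-effect inverse-variance pooling of odds ratios.

    Returns (pooled OR, lower limit, upper limit, Z value, two-sided p value).
    """
    weight_sum = 0.0
    weighted_log_or = 0.0
    for odds_ratio, lower, upper in studies:
        # Standard error of log(OR), recovered from the width of the reported CI
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1.0 / se ** 2
        weight_sum += weight
        weighted_log_or += weight * math.log(odds_ratio)
    log_or = weighted_log_or / weight_sum
    se_pooled = math.sqrt(1.0 / weight_sum)
    z = log_or / se_pooled
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return (math.exp(log_or),
            math.exp(log_or - Z95 * se_pooled),
            math.exp(log_or + Z95 * se_pooled),
            z, p)


if __name__ == "__main__":
    or_, lo, hi, z, p = pooled_odds_ratio(STUDIES)
    print(f"pooled OR = {or_:.3f} (95% CI {lo:.3f}-{hi:.3f}), Z = {z:.2f}, p = {p:.1e}")
```

Under this assumption the pooled odds ratio, its confidence limits, and the Z value closely match the last row of Table 1, which suggests (but does not prove) that a fixed-effect model was used. The large weight of the narrow-interval studies also explains why the single Iranian study, with its wide interval, has little influence on the pooled estimate.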
Data obtained with LDlink revealed that rs3746444, which is located in a non-coding segment, has a
RegulomeDB score of "5" (Table 2). It seems that rs3746444 does not exhibit any significant
biological activity, such as alterations in transcription factor binding capacity or gene
regulatory effects, in the Iranian population.


http://jcmr.um.ac.ir
Journal of Cell and Molecular Research (2020) 12 (1), 22-32

Table 2. Details of putative regulatory functions of query SNPs and their associated proxy SNPs.

Variant | LD (r²) | LD (D') | ASN freq | Enhancer histone marks | DNase | dbSNP func annot | GENCODE genes | RegulomeDB score | Predicted function
rs3746444 | 1 | 1 | 0.17 | BRST, [illegible] | [illegible] | Intronic | MYH7B/MIR499A | 5 | [illegible]
rs3746435 | 0.87 | 1.0 | 0.12 | [illegible] | [illegible] | Missense | MYH7B | 5 | [illegible]
rs6088678 | 0.87 | 1.0 | 0.12 | 8 tissues | - | Intronic | TRPC4AP | 1f | eQTL + TF binding / DNase peak
rs1062577 | 1 | 1 | 0.27 | - | - | 3'-UTR | ESR1 | 6 | Motif hit
rs1049174 | 1 | 1 | 0.60 | 2 | 8 tissues | 3'-UTR | RP11-277P12.20 | 1d | eQTL + TF binding + any motif + DNase peak
rs2617160 | 0.88 | 0.9 | 0.59 | 10 tissues | 5 tissues | Intronic | RP11-277P12.20 | 1f | eQTL + TF binding / DNase peak

A comparison of several factors between query and proxy variants in high LD was performed in the
Asian population.
ASN freq: allele frequency in the Asian population. dbSNP func annot: the functional region in
which the SNP is located. GENCODE genes: the gene region in which the SNP is located. RegulomeDB
score: RegulomeDB is a database that scores SNP functionality based on experimental data. In all
tables, query SNPs are displayed in bold.
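The proxy selection rule from the Methods (strong LD with the query SNP, then a split into coding variants with scores 4-6 and non-coding variants with scores 1a-1f) can be written as a small filter. The sketch below applies it to the proxy rows of Table 2. The record layout, the function name `classify_proxy`, and the treatment of "missense" as the only coding annotation are assumptions made for illustration.

```python
from collections import namedtuple

# rsid, paired query SNP, LD (r2) with the query, dbSNP annotation, RegulomeDB score
Proxy = namedtuple("Proxy", "rsid query r2 annotation score")

# Proxy rows as reported in Table 2
PROXIES = [
    Proxy("rs3746435", "rs3746444", 0.87, "missense", "5"),
    Proxy("rs6088678", "rs3746444", 0.87, "intronic", "1f"),
    Proxy("rs2617160", "rs1049174", 0.88, "intronic", "1f"),
]

CODING_ANNOTATIONS = {"missense"}                        # assumption: extend as needed
WEAK_EVIDENCE = {"4", "5", "6"}                          # numerically high scores
STRONG_EVIDENCE = {"1a", "1b", "1c", "1d", "1e", "1f"}   # numerically low scores


def classify_proxy(proxy, r2_threshold=0.8):
    """Return 'coding', 'non-coding', or None if the proxy fails the selection rule."""
    if proxy.r2 <= r2_threshold:
        return None  # not in strong LD with the query SNP
    is_coding = proxy.annotation.lower() in CODING_ANNOTATIONS
    if is_coding and proxy.score in WEAK_EVIDENCE:
        return "coding"
    if not is_coding and proxy.score in STRONG_EVIDENCE:
        return "non-coding"
    return None


if __name__ == "__main__":
    for proxy in PROXIES:
        print(proxy.rsid, proxy.query, classify_proxy(proxy))
```

Writing the rule this way makes the two comparison groups explicit: a coding proxy with weak regulatory evidence serves as a contrast for a non-coding proxy with strong regulatory evidence at the same locus.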


Table 3. Regulatory chromatin status from DNase and histone ChIP-Seq (Roadmap Epigenomics Consortium, 2015).

Variant | Group | Description | H3K4me1 | H3K4me3 | H3K27ac | H3K9ac | DNase
rs3746444 | Epithelial | Breast Myoepithelial Primary Cells | H3K4me1_Enh | - | - | - | -
rs3746444 | Epithelial | Breast variant Human Mammary Epithelial Cells (vHMEC) | - | - | - | - | DNase
rs3746435 | Epithelial | Breast Myoepithelial Primary Cells | - | - | - | - | -
rs3746435 | Epithelial | vHMEC | - | - | - | - | -
rs6088678 | Epithelial | Breast Myoepithelial Primary Cells | - | - | - | H3K9ac_Pro | -
rs1062577 | Epithelial | vHMEC | H3K4me1_Enh | - | - | - | -
rs1049174 | Epithelial | Breast Myoepithelial Primary Cells | H3K4me1_Enh | - | - | - | -
rs1049174 | Epithelial | vHMEC | H3K4me1_Enh | - | - | - | -
rs2617160 | Epithelial | Breast Myoepithelial Primary Cells | H3K4me1_Enh | - | - | - | -
rs2617160 | Epithelial | vHMEC | H3K4me1_Enh | - | - | - | DNase

Query SNPs were compared with proxy SNPs in terms of cell and tissue context, and the histone
modifications that each one creates in the target cells were investigated.
Open chromatin: DNase I hypersensitivity. Histone modifications: H3K4me1, H3K4me3, H3K9ac,
H3K27ac. In all tables, query SNPs are displayed in bold.
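The Introduction describes how H3K4me1 and H3K27ac are read together: H3K4me1 with H3K27ac marks an active enhancer, while H3K4me1 without H3K27ac marks a ready-to-activate enhancer. A minimal sketch of that rule, applied to a few rows of Table 3, is given below. The function name and the boolean encoding of the table cells are assumptions for illustration, and the rule is a simplification of the cited literature rather than the procedure used by HaploReg.

```python
def enhancer_state(h3k4me1: bool, h3k27ac: bool) -> str:
    """Classify a region from two histone marks, following the rule cited in the Introduction."""
    if h3k4me1 and h3k27ac:
        return "active enhancer"
    if h3k4me1:
        return "ready-to-activate enhancer"
    return "no enhancer signal"  # H3K27ac without H3K4me1 is not covered by the rule


# (variant, cell type) -> (H3K4me1 present, H3K27ac present), read from Table 3
TABLE3_MARKS = {
    ("rs3746444", "MEP"): (True, False),
    ("rs3746435", "MEP"): (False, False),
    ("rs1049174", "MEP"): (True, False),
    ("rs1049174", "vHMEC"): (True, False),
    ("rs2617160", "vHMEC"): (True, False),
}


if __name__ == "__main__":
    for (rsid, cell), marks in TABLE3_MARKS.items():
        print(rsid, cell, enhancer_state(*marks))
```

None of the rows in Table 3 carries H3K27ac, so under this rule every H3K4me1-positive site is classed as ready-to-activate rather than active.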


Increased frequency of haplotype AGC 

The LDhap analysis (http://analysistools.nci.nih.gov/LDlink/tab=ldhap) showed a high frequency
(79%) of the AGC haplotype across the three SNPs rs3746444, rs3746435, and rs6088678. The results
indicated that when the query SNP allele is adenine, the alleles at the proxy SNPs rs3746435 and
rs6088678 are G and C, respectively. There was very strong LD among these three SNPs (87%), and
the AGC haplotype was highly abundant. Allele A of the query SNP rs3746444 is more likely to be
associated with allele G, whereas if the query allele is G, the proxy allele is likely to be C
(Figure 3).

RS Number | Position (GRCh37) | Allele Frequencies
rs3746444 | chr20:33578251 | A=0.795, G=0.205
rs3746435 | chr20:33587198 | G=0.816, C=0.184
rs6088678 | chr20:33607551 | C=0.816, T=0.184

Haplotypes
Haplotype Count | 1579 | 365 | 41
Haplotype Frequency | 0.7951 | 0.1838 | 0.0206

Figure 3. Haplotype analysis of the query SNP rs3746444 with the two proxy SNPs rs3746435 and
rs6088678. Results obtained from the haplotype study of SNPs using the LDlink web-based tool
indicate that when the query SNP allele is adenine, the alleles at the proxy SNPs rs3746435 and
rs6088678 are G and C, respectively. There is very strong LD among these three SNPs (87%).
(http://analysistools.nci.nih.gov/LDlink/tab=ldhap)
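The r² and D' values in Table 2 follow from the frequencies reported by LDhap. As a worked check, the sketch below computes both statistics for rs3746444 and rs3746435 from the allele frequencies and the AGC haplotype frequency given above. It assumes that every chromosome carrying allele A at rs3746444 also carries G at rs3746435 (the A allele frequency, 0.795, equals the AGC haplotype frequency), so the A-G pair frequency is taken to be 0.7951. The function name is hypothetical.

```python
def ld_stats(p_a: float, p_b: float, p_ab: float):
    """Pairwise LD between two bi-allelic loci.

    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    p_ab : frequency of the haplotype carrying both A and B
    Returns (D, D', r2).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # Rounded input frequencies can push |D| marginally past D_max, so cap D' at 1.
    d_prime = min(abs(d) / d_max, 1.0)
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


if __name__ == "__main__":
    # rs3746444 allele A = 0.795, rs3746435 allele G = 0.816, A-G haplotype = 0.7951
    d, d_prime, r2 = ld_stats(0.795, 0.816, 0.7951)
    print(f"D = {d:.4f}, D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

With these rounded inputs r² comes out near 0.88 and D' reaches its cap of 1.0, in line with the values of 0.87 and 1.0 listed for rs3746435 in Table 2. The small difference in r² is what one would expect from using frequencies rounded to three decimals.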




Association of SNPs with transcriptional levels of the target genes

The polymorphism rs3746444-MYH7B/MIR499A induces a weak transcriptional level in breast MEPs and
vHMECs, which in turn results in the formation of a weak Polycomb complex and reduced regulatory
effects on the target gene. In contrast, rs3746435 induces strong transcription in the examined
cell lines. The proxy SNP rs6088678-TRPC4AP showed strong transcription, in addition to a score of
"1f", in both cell lines (Table 4). As shown in Figure 4, the non-coding proxy SNP rs6088678, with
a low score of "1f", indicated the highest expression level in breast tissue. Its value was 40-60
Reads Per Kilobase per Million mapped reads (RPKM) (Figure 4).

[Figure 4 plot: "TRPC4AP Gene Expression from GTEx (Release V6)"; genomic position: hg19 chr20:33591213-33680674]

Figure 4. TRPC4AP gene expression from the GTEx project for rs6088678.
The non-coding proxy SNP rs6088678, with a low score of "1f", indicates the highest expression
level in breast tissue (red arrow). Its value is 40-60 Reads Per Kilobase per Million mapped reads
(RPKM).
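For readers unfamiliar with the unit used in Figure 4, RPKM normalizes a gene's read count by transcript length (in kilobases) and by sequencing depth (in millions of mapped reads). The sketch below shows the arithmetic. The input numbers are invented for illustration and are not GTEx values, and the function name is hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    kilobases = gene_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_of_reads)


if __name__ == "__main__":
    # Hypothetical gene: 5,000 reads on a 2,000 bp transcript in a 50-million-read library
    print(rpkm(5_000, 2_000, 50_000_000))
```

Because both length and depth are divided out, values such as the 40-60 RPKM range quoted for TRPC4AP can be compared across genes and tissues within the same release.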


Table 4. Genome browser, chromatin state and accessibility.

Method | SNP | Location | Chromatin state | Group | Tissue
ChromHMM | rs3746444 | chr20:33575200..33578600 | Weak transcription | Epithelial | Breast Myoepithelial Primary Cells
ChromHMM | rs3746444 | chr20:33574000..33583000 | Weak Repressed PolyComb | Epithelial | Breast variant Human Mammary Epithelial Cells (vHMEC)
ChromHMM | rs3746435 | chr20:33583600..33645600 | Strong transcription | Epithelial | Breast Myoepithelial Primary Cells
ChromHMM | rs3746435 | chr20:33583000..33590400 | Quiescent/Low | Epithelial | vHMEC
ChromHMM | rs6088678 | chr20:33583600..33645600 | Strong transcription | Epithelial | Breast Myoepithelial Primary Cells
ChromHMM | rs6088678 | chr20:33603600..33608000 | Strong transcription | Epithelial | vHMEC
ChromHMM | rs1062577 | [illegible] | [illegible] | Epithelial | Breast Myoepithelial Primary Cells
ChromHMM | rs1062577 | [illegible] | Weak transcription | Epithelial | vHMEC
ChromHMM | rs1049174 | chr12:10524400..10528200 | Weak transcription | Epithelial | Breast Myoepithelial Primary Cells
ChromHMM | rs1049174 | chr12:10525000..10525600 | Enhancers | Epithelial | vHMEC
ChromHMM | rs2617160 | chr12:10544600..10549000 | Enhancers | Epithelial | Breast Myoepithelial Primary Cells
ChromHMM | rs2617160 | chr12:10544800..10546400 | Enhancers | Epithelial | vHMEC

ChromHMM (a hidden Markov model) is applied to annotate the non-coding genome using epigenomic
information from one or multiple cell types. Using the RegulomeDB web-based tool, the
transcription level of query SNPs and proxy SNPs in different tissues and cell types was
determined. In all tables, query SNPs are displayed in bold.


The NIH Genotype-Tissue Expression (GTEx) project was created to establish sample and data
resources for studies of the relationships between genetic variation and gene expression levels in
multiple human tissues. The GTEx track shows median gene expression levels in 51 tissues and 2
cell lines, based on RNA-seq data from the GTEx midpoint milestone data release (V6, October
2015). This release is based on data from 8555 tissue samples obtained from 570 adult post-mortem
individuals.

All regulatory features shown in the tables were obtained from ENCODE and NIH Roadmap Epigenomics
data through the UCSC Genome Browser.


SNP rs1062577 

Meta-analysis was not possible for the query SNP rs1062577-ESR1 because of the limited number of
studies carried out on rs1062577 in Asia. The interactive plot in LDlink revealed no SNP in strong
LD (≥ 0.8) around rs1062577. The RegulomeDB score of "6" indicated no remarkable effect on gene
expression levels (Table 2). However, the target gene of rs1062577 (ESR1) is expressed in breast
cancer tissues, and abundant data show ESR1 expression of up to about 50 RPKM. In addition, this
polymorphism induces the H3K4me1_Enh histone modification in vHMECs (Table 3). Nevertheless, there
was little evidence that rs1062577 is a functional non-coding SNP.


SNP rs1049174 versus rs2617160

The interactive plot from LDlink indicated a low density of SNPs in strong LD around rs1049174.
The non-coding proxy SNP rs2617160, located in an intronic region with a score of "1d", was
selected for further analysis (Table 2). There was no coding SNP in strong LD with rs1049174. Both
the query SNP rs1049174 and the proxy SNP rs2617160 caused the H3K4me1_Enh histone modification in
the investigated cell lines and were associated with breast cancer tissue.

In contrast to the query SNP, the proxy SNP rs2617160, through the induction of motif changes,
produces open chromatin regions in vHMECs. Hence, DNase I can cut DNA in this region (Table 3).
Both rs1049174 and its proxy SNP rs2617160, which were submitted to RegulomeDB, are located in
RP11-277P12.20 enhancer sites in the examined breast cancer cell lines. rs1049174 caused a weak
transcriptional level in breast MEPs and is specifically located in an enhancer in vHMECs (Table
4).


Discussion 


Previous studies demonstrated that most GWAS variants fall in non-coding (nc) regions. Identifying
the functions of these ncSNPs remains a major challenge. The importance of understanding the
functional contributions of specific risk variants to disease pathogenesis is widely accepted
(Rhie et al., 2013). The biological effects of most SNPs studied so far in the Iranian population
were not strong. In the present study, functional analyses were performed with a set of in silico
approaches for previously known breast cancer risk-associated SNPs in the Iranian population.
HaploReg was developed as a computational tool by Ward and Kellis (Ward et al., 2011) to intersect
single nucleotide variants (SNVs) with chromatin states (Ernst et al., 2010). This work shows, for
the first time, that a comprehensive in silico analysis of well-known ncSNPs and regulatory
regions is essential before they can be attributed to the Iranian population.

It was previously reported that rs3746444 (Kabirizadeh et al., 2016), rs1062577 (Dehghan et al.,
2017), and rs1049174 (Ghobadzadeh et al., 2013) are associated with an increased risk of breast
cancer in the Iranian population. We focused on non-coding proxy SNPs (LD ≥ 0.8 with the query
SNPs rs3746444, rs1062577, and rs1049174) obtained from LDlink. It was assumed that all non-coding
variants located in regulatory regions (promoter, enhancer, 5'UTR, 3'UTR) have a highly ranked
RegulomeDB score (Table 2). The meta-analysis of rs3746444 in the Asian and Iranian populations
indicated a statistically significant relationship with breast cancer, with an odds ratio (OR) of
1.15 (1.09-1.22). This analysis was only possible for one SNP (Table 1). Moreover, the regulatory
effects of rs3746444-MYH7B/MIR499A, rs1062577-ESR1, and rs1049174-RP11-277P12.20 and their related
proxy SNPs were determined on the basis of high LD. We applied this analysis to identify the most
likely functional variant among the MYH7B, ESR1, and RP11-277P12.20 genes. However, a solid
picture of the functional significance of variants cannot be obtained with a single bioinformatics
tool. Hence, complementary tools were applied in the current study. Three computational tools,
LDlink, HaploReg, and RegulomeDB, were used in combination on the above-mentioned SNPs to
prioritize ncSNPs for their association with disease status. The LD structure of haplotype blocks
for the Iranian population was not available because GWAS have not previously been performed in
Iran. Hence, information from the Asian population was used as a reference for the LDlink
analyses.

We identified the query SNP rs1049174, in the 3'UTR region, as the only SNP previously studied in
the wet lab that has a high-ranked RegulomeDB score ("1d") and validated functional effects (eQTL
+ TF binding + any motif + DNase peak) (Table 2). rs1049174 caused the histone modification
H3K4me1 in both cancerous cell lines, which confirms that these enhancers are ready to be
activated.

The present study demonstrated that SNPs in the MYH7B, TRPC4AP, and RP11-277P12.20 genes (Table
2), in addition to the ncSNPs rs6088678 and rs2617160, are functionally important, although
wet-lab experiments are essential to validate the results. Pairwise comparisons confirmed that the
intronic SNP rs6088678 (r² = 0.87 with rs3746444), with a RegulomeDB score of "1f", showed more
evidence of being functional than rs3746444 (Table 2). rs6088678 induced the histone modification
H3K9ac in breast myoepithelial primary cells (Table 3). To our knowledge, this is the first
association study between breast cancer susceptibility and polymorphisms of the MYH7B, MIR499A,
TRPC4AP, ESR1, and RP11-277P12.20 genes. These genes were selected using LDlink for the Iranian
population. RegulomeDB is a powerful tool for predicting the regulatory potential of variants. It
is expected that the RegulomeDB web-based tool will be widely applied in future large-scale
association studies.


Conclusion 


Considering the comparisons made in the present study, which confirmed epigenetic properties for
non-coding SNPs, the importance of these segments in functional epigenetic studies is highlighted.
Non-coding SNPs have a great impact on the binding capacity of regulatory proteins and on gene
expression patterns, as they can lead to histone modifications (Khurana et al., 2016). To evaluate
the possible functional properties of shortlisted SNPs in the Iranian population, in silico
analyses using LDlink, RegulomeDB, and HaploReg are strongly recommended. Our computational model
can be expected to prioritize variants in regulatory regions and thus help researchers identify
functional variants in non-coding regions with key effects on the pathogenesis of various
diseases.